Introduction

Polyamides have been shown to inhibit binding of transcription factors to specific DNA
sequences, and thus can be considered candidate therapeutic agents for regulating gene
expression. Pyrrole-imidazole (Py-Im) polyamide molecules can be designed to
recognize the dsDNA minor groove with affinity and sequence specificity comparable to
those of gene transcription factors. In addition to pyrrole and
imidazole aromatic rings and their modifications, polyamide chains may contain other
“residues” that improve polyamide-DNA specificity [6], interfere with binding of transcription
factors [7], or enhance cell and nuclear membrane permeability of the polyamide candidate
drugs. These common polyamide building blocks, pairing rules and possible topologies are
described in Figures 1-3 and Table I.

In this project we design highly specific polyamides to target the erbB2/Her2 promoter
region, thus disrupting formation of the transcription complex and inhibiting expression of
this important oncogene. The first generation of anti-erbB2 polyamide inhibitors, which bind
DNA sequences in the proximity of the TATAA box, has been shown to effectively inhibit
expression of the erbB2 gene in cell-free expression systems. However, the 7-base-pair
polyamide-DNA binding site used in these initial studies is too short and occurs ~10^6
times in the human genome, which calls into question the safety and efficacy of candidate
drugs based on these polyamide constructs.
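
The dependence of specificity on site length can be illustrated with a rough estimate. The sketch below assumes an idealized random genome with equal base frequencies and a size of 3×10^9 bp; both are assumptions made here for illustration (the real genome is neither random nor uniform in composition), and the function name is hypothetical.

```python
def expected_random_hits(site_length, genome_bp=3.0e9, both_strands=True):
    """Expected number of exact occurrences of one specific site of the given
    length in a random sequence with equal base frequencies (an idealization)."""
    strands = 2 if both_strands else 1
    return strands * genome_bp / 4 ** site_length

# Compare a short 7 bp site with the longer sites considered in this study.
for n in (7, 12, 13, 16):
    print(f"{n:2d} bp: ~{expected_random_hits(n):.3g} expected occurrences")
```

Under these assumptions a 7 bp site is expected several hundred thousand times, whereas a 16 bp site is expected about once, which is the motivation for targeting longer sites. Actual counts depend on base composition and repeats and must come from a genome search.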

The major goal of our study is to rationally select longer (12-16 bp) dsDNA targets in
the erbB2 promoter to achieve maximum whole-genome specificity and to design optimal
polyamide binders to these regulatory sites.


Body 

Task 1: Optimization of target sequences in the Her2/erbB-2 gene promoter.

The sequence of the erbB2 gene promoter contains well-characterized TATAA and
CCAAT boxes, a repetitive GGA motif and putative SP1 binding sequences in the region
upstream of the major transcription start site (see Figure 4). Despite the presence of a
TATA box, multiple transcription start sites have been found, the major ones being 21 and
70 bp downstream of the TATA box. It was shown that the 500 bp region upstream of the major
start site is sufficient for both basal and inducible transcription activity, with the most
proximal 125 bp DNA stretch being responsible for about 30-fold overexpression in most
cancer cell lines [11].

a. List all short (8-16 bp) sequences flanking TATA, CAAT and GC boxes in the Her2
promoter.

We performed a comprehensive database analysis, based on the specialized MatInspector
tool [12,13], to find putative regulatory elements in the 500 bp promoter of erbB2. Table II
lists the results of this search for the most important 150 bp proximal region. Most sites
found and characterized previously were identified in the search (these entries are
emphasized in both Table II and Figure 4). For example, the ETS response element next to
the TATAA box [10,11], as well as the AP-2 binding site and the CCAAT box, were identified.

Based on the analysis presented in Table II, we selected six short (16 bp) sequences flanking
transcription factor binding sites (see Figure 4). Note that four of these sequences overlap
with more than one major activation site, which makes them the most interesting targets for
antigene therapy.

b.  Rate  specificity  of  the  listed  sequences  in  the  human  genome. 

The published human genome sequence gives us an opportunity to predict the specificity of a
polyamide binder at the whole-genome level. We have designed a specialized program that
performs exhaustive BLAST-based searches of the human genome draft to assess the sequence
specificity of a particular binding pattern. We performed both searches for exact sequence
matches and a simple sequence profile search with a low penalty for A-T substitution.
The latter approach was devised to take into account the full degeneracy of Py-Py recognition
of the A-T pair and the partial degeneracy of pyrrole-hydroxypyrrole (Py-Hp) recognition of A-T.

Using this program, we assigned a specificity to all possible 11, 12, 13 and 14 bp fragments
within the preselected target sequences. Figure 5 shows an example result of our analysis
for 13 bp fragments.
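
The idea of scoring every fragment of a target site can be sketched as follows. This is not the BLAST-based program described above: it substitutes a naive regular-expression scan over an in-memory string, and it treats A and T as fully interchangeable, which is a simplification of the low-penalty profile search. All function names and the toy "genome" are assumptions made for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def fragments(target, length):
    """All contiguous fragments of the given length within a target site."""
    return [target[i:i + length] for i in range(len(target) - length + 1)]

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def _pattern(seq, degenerate_at):
    # With degenerate_at, A and T are treated as interchangeable (a simplification).
    return "".join("[AT]" if degenerate_at and base in "AT" else base for base in seq)

def count_hits(fragment, genome, degenerate_at=False):
    """Count possibly overlapping matches of a fragment on both strands."""
    fragment, genome = fragment.upper(), genome.upper()
    # A set avoids double counting when both strands give the same pattern.
    patterns = {_pattern(fragment, degenerate_at),
                _pattern(reverse_complement(fragment), degenerate_at)}
    # A zero-width lookahead lets overlapping matches be counted.
    return sum(len(re.findall(f"(?={p})", genome)) for p in patterns)

toy_genome = "GAGCGCGCTTGCTCCC"  # short stand-in for a genome sequence
for frag in fragments("GCGCTTGC", 5):
    exact = count_hits(frag, toy_genome)
    relaxed = count_hits(frag, toy_genome, degenerate_at=True)
    print(frag, exact, relaxed)
```

A fragment with few hits under both the exact and the relaxed count would be rated as more specific. For a real genome an indexed search tool is needed, since a linear scan of this kind does not scale.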

c.  Rate  conservation  of  listed  sequences  using  several  versions  of  the  promoter. 

Conservation of the target sequence is critical for development of effective antigene
inhibitors. Analysis of the 6 available versions of the erbB2 promoter sequence from
different sources has demonstrated good sequence conservation in the chosen proximal
region from -150 to 0, while more deletions and insertions are possible in the farther
upstream sequence. In the proximal region, the erbB2 promoter sequence can contain gaps at
positions -135 and -122 and an A->T mutation at position -69; the corresponding sites are
shown in Figure 5 in red.

d.  Sort  the  list  of  target  sequences. 

We sorted all the short fragments within the highlighted sites based on the sequence
specificity score, length and overlap with core activation sites. This analysis has produced
several nontrivial insights. First, we found that the whole region around the TATAA box,
which is very important for regulation of gene activity, has very poor specificity in the
human genome. In addition, sequence 6 is very AT-rich, which further lowers its
polyamide specificity score. On the other hand, sequences 1, 2, and 4 contain 13 bp
fragments with almost unique whole-genome specificity, and each of them overlaps with more
than one activation site.

As a result of the above sequence analysis of the erbB2 promoter performed in Task 1, the
sequences listed in Table III have been chosen as optimal targets for polyamide design.
The most promising target is DNA sequence 4, which overlaps with 2
important regulatory sites of the erbB2/Her2 promoter, is almost unique in the genome, does
not have documented variations in the sequence, and also has a low AT content, which is
beneficial for polyamide recognition specificity.


Task 2: Overall design and evaluation of complementary polyamides.

a. For each target sequence generate a set of polyamide molecules using DNA-polyamide
recognition code and a choice of additional blocks.

Using a set of polyamide elements and polyamide-DNA pairing rules [15,16], summarized in
Table I, we have devised an algorithm to build all matching polyamide sequences for each
target dsDNA site. The algorithm starts by building a “perfect match” sequence that contains
Py, Im and Hp rings only and performs all possible substitutions and connections to allow
the various types of topology suggested in the proposal. Additional empirical rules are also
applied to eliminate infeasible designs, e.g., only 2 to 4 successive rings are allowed,
β-alanines are isolated, only 4 γ-links are allowed, and so on. With these restrictions applied,
the program automatically generates as many as ~30-50 different polyamide sequences for
each 13 bp DNA sequence or ~20-30 polyamides for 12 bp DNA. We performed this
procedure with the best 50 DNA targets from our target list and stored the resulting 1285
“sequences” of polyamide-DNA complexes in a specialized database.
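
The first step of such an algorithm can be sketched as below. The mapping uses the widely published ring-pairing rules (Im/Py for G·C, Py/Im for C·G, Hp/Py for T·A, Py/Hp for A·T) with the one-letter codes I, P, H and b used in this report; it is stated here as an assumption, since the body of Table I is not reproduced in this text. Strand orientation, topology and linkers are ignored, and the function names are hypothetical.

```python
import re

# Ring pair (first strand, second strand) placed opposite each base of the
# target strand; assumed from the standard published pairing rules.
RING_PAIRS = {"G": ("I", "P"), "C": ("P", "I"), "T": ("H", "P"), "A": ("P", "H")}

def perfect_match(site):
    """Return two ring strings; position i of each pairs with site[i]."""
    pairs = [RING_PAIRS[base] for base in site.upper()]
    first = "".join(p[0] for p in pairs)
    second = "".join(p[1] for p in pairs)
    return first, second

def ring_runs_ok(strand, min_run=2, max_run=4):
    """Empirical filter: every run of consecutive rings (I, P, H) must contain
    between min_run and max_run rings; any other symbol breaks a run."""
    runs = re.findall(r"[IPH]+", strand)
    return all(min_run <= len(run) <= max_run for run in runs)

first, second = perfect_match("GCTA")
print(first, second)            # ring-only "perfect match" strands
print(ring_runs_ok("IPIbIP"))   # runs of 3 and 2 rings
print(ring_runs_ok("IPIPI"))    # a run of 5 rings
```

A ring-only perfect match for a 12-13 bp site necessarily fails the run-length filter, which is why substitutions such as β-alanine for a ring are enumerated next and each variant is filtered again.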


b.  Check  feasibility  of  chemical  synthesis  for  designed  compounds. 

Polyamide chains containing various combinations of imidazole (Im), pyrrole (Py) and
hydroxypyrrole (Hp) rings, β-alanine, γ-linkers, and many other building blocks can be
produced by Boc solid-phase chemistry using standard protocols described in works from
Peter Dervan’s laboratory. Recently, Fmoc solid-phase chemistry has also been
introduced for machine-assisted synthesis of Im-Py polyamides, as well as oxime resin
chemistry, which extends the repertoire of polyamide C-terminal tails.

In  our  design  we  utilize  a  standard  set  of  residues  and  overall  topologies,  with  proven 
chemical  feasibility.  While  some  designs  here  may  be  preferred  over  others,  currently  no 
theoretical  limitations  have  been  found  on  chemical  feasibility  of  polyamides  in  our  database. 

c.  Make  preliminary  estimations  for  affinity  and  specificity  of  each  compound. 

The central part of our project is 3D modeling of the resulting DNA-polyamide complexes
and evaluation of their relative affinity. Our original algorithm uses the fact that polyamide
complexes with DNA are very modular in structure. This allows us to build initial
conformations of new complexes based on known X-ray geometries of previously
characterized complexes [24-26]. The program tethers DNA and ligand residues to the
respective residues in the X-ray structure. These initial conformations are subsequently
optimized by restrained energy minimization, where the energy terms include bonded, van der
Waals, electrostatic and hydrogen bonding terms. The application of geometry restraints
enforces DNA-DNA base-pairing and DNA-polyamide pairing rules in the initial stage of the
optimization, forcing the model to follow the “canonical” pattern of polyamide-DNA
recognition. In the final stage, the restraints are removed and free global energy
minimization is applied. The deviation between the restrained and the free energy-minimized
models is usually within an all-atom RMSD of 1.5 Å for “match” polyamide-DNA complexes,
which suggests high quality of the modeling. Single polyamide mismatches increase this RMSD
to ~2-3 Å, reflecting large deviations of the fully energy-optimized model from the
“canonical” recognition pattern.

The polyamide-DNA binding energy of the models was estimated in terms of van der
Waals, hydrogen bonding, electrostatic and solvation contributions. Comparison with more
than 50 published measurements for short polyamide hairpins puts the accuracy of
relative binding energy predictions at about 1.7 kcal/mol. This polyamide-DNA modeling
algorithm was presented at the Program in Mathematics and Molecular Biology meeting.




Task  3:  Detailed  modeling  and  selection  of  candidate  structures 

a. Test and adjust the ICM global minimization procedure with published polyamide-DNA
complexes.

The polyamide modeling algorithm was further upgraded to accommodate new variants
of polyamide topology and to improve affinity estimates by using a more accurate molecular
force field. We have also adjusted the procedure for automated 3D modeling of polyamide-DNA
complexes to make conformational and binding energy predictions more robust for
longer complexes with new design elements.

The first improvement deals with the choice of starting configurations of the complex and
polyamide placement. The new algorithm uses standard B-DNA as the initial conformation and
places the polyamide chain into the DNA minor groove according to the specified
polyamide-DNA pairing rules. Only then are the special distance constraints provided by the
available polyamide-DNA X-ray structures employed in the energy optimization of the
complex. These modifications help to avoid strong deviations from the B-DNA structure in the
initial steps of the procedure and provide much faster and more reliable convergence of the
energy minimizations.

The other improvement takes advantage of the new internal coordinate force field (ICFF)
developed in the lab. The ICFF is automatically generated from a “source” Cartesian force
field (such as MMFF94s or Amber) with an algorithm that “projects” Cartesian parameters
into the torsion coordinate space. Implicit flexibility, naturally incorporated into the torsion
energy parameters, is critical to the accuracy of the internal coordinate model with fixed
covalent geometry. Also essential is the ability of the ICFF method to generate fixed covalent
geometries for new chemical structures, using Cartesian geometry minimization with the
source force field. This feature facilitates inclusion of new elements into our custom
polyamide residue library, producing fixed residue geometries compatible with the new
torsion force field. Direct modifications of the polyamide chain sequence (e.g., replacement
of an aromatic ring by β-alanine) are now allowed through fast local geometry optimization in
Cartesian coordinates, followed by internal coordinate global optimization.

The prediction accuracy of the new algorithm with ICFF geometries and energy functions
improved substantially compared to the previous version with the ECEPP torsion potential,
reducing the geometry RMSD from ~1.2 Å to just ~0.9 Å in our standard comparison test with
available PDB entries (365d and 334d). Binding free energy estimates with the new
algorithm also improved from 1.7 kcal/mol to 1.3 kcal/mol RMSD.

The predictive power of our polyamide-DNA modeling algorithm was also evaluated in an NMR
structural study performed in collaboration with Dr. Wemmer’s group. A conformational
model of a 10-ring hairpin-DNA complex, derived ab initio by our algorithm, was found to be
in excellent agreement with the corresponding NMR model built with NOESY distance
constraints, with RMSD < 1 Å (see the attached poster presentation).

b.  Build  all-atom  models  for  DNA  complexes  with  newly  designed  polyamides 

The automated procedure for polyamide design was programmed with the ICM molecular
modeling package. It takes DNA sequences and coded polyamide sequences as input
and produces energy-optimized complexes as output. An example of the program input
and output is shown in Figure 6.

The program reads the input sequence, where each DNA and polyamide “residue” is
represented by one letter or symbol. Double-stranded DNA is built in a standard
energy-optimized B-form by an original ICM script. A polyamide chain of specific sequence
(or two chains in the case of overlapping hairpin topology) is built from the library of
residues. The pairing between polyamide residues and DNA residues is assigned according to
the input.

One  or  more  X-ray  templates  are  then  superimposed  onto  the  DNA  structure  to  cover  the 
polyamide  binding  site,  and  the  polyamide  atoms  are  “tethered”  to  the  corresponding 
polyamide  atoms  in  the  templates. 

Tight binding of polyamides in the DNA minor groove and the modular nature of the
pairing between the molecules suggest a special approach to energy minimization of the
complex. We apply the so-called ICM “regularization” procedure to minimize both the length
of the “tethers” and the conformational energy of the object. The regularization procedure
goes through several iteration steps, using a different weight ratio for the conformational
energy and the “tether tension” energy at each minimization step. The weight of the tethers
in the energy function gradually decreases throughout the regularization procedure, making
the final solution virtually independent of the tethers. Minimizations performed in torsion
coordinates not only guarantee fast convergence of this procedure, but also prevent severe
deformations of the covalent geometry due to the tether tension in the initial steps of the
procedure. Spatial positions of the templates are readjusted in the course of the
regularization procedure to allow large-scale movement of the DNA backbone. This
annealing-like algorithm is designed to generate low-energy structures with high local
similarity to the templates.
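
The effect of a gradually decreasing tether weight can be illustrated with a one-dimensional toy problem. This is not the ICM regularization procedure: the energy function, the template position, the weight schedule and the use of plain gradient descent are all assumptions chosen only to show the principle.

```python
def conf_energy(x):
    """Toy 'conformational energy': a tilted double well (illustrative only)."""
    return (x * x - 1.0) ** 2 - 0.3 * x

def conf_gradient(x):
    return 4.0 * x ** 3 - 4.0 * x - 0.3

def minimize(x, template, tether_weight, steps=2000, rate=0.01):
    """Gradient descent on conf_energy(x) + tether_weight * (x - template)**2."""
    for _ in range(steps):
        grad = conf_gradient(x) + 2.0 * tether_weight * (x - template)
        x -= rate * grad
    return x

def regularize(x, template, weights=(10.0, 3.0, 1.0, 0.3, 0.0)):
    """Repeat the minimization while the tether weight is reduced to zero."""
    for weight in weights:
        x = minimize(x, template, weight)
    return x

start, template = -1.5, 0.8   # poor starting point, template near the better well
x_free = minimize(start, template, tether_weight=0.0)
x_tethered = regularize(start, template)
print(f"no tethers:   x = {x_free:.3f}, E = {conf_energy(x_free):.3f}")
print(f"with tethers: x = {x_tethered:.3f}, E = {conf_energy(x_tethered):.3f}")
```

In this toy case the untethered minimization is trapped in the nearest local minimum, while the tethered schedule first pulls the coordinate toward the template and then relaxes into the lower-energy well with no tether term left in the final step.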

For each of the three selected 16-bp DNA targets, we generated more than 100 polyamide
“perfect match” complexes with 12-bp DNA recognition sites, which differ in the positions of
5-membered rings in the sequence or in overall topology. We use several criteria to check the
quality of the models built. First, we check the length of the hydrogen bond contacts between
polyamide and DNA residues that are expected from the pairing rules. For the best models
we found up to 93% of the 34 hydrogen bonds to be within 2.5 Å (measured as the
hydrogen to heavy atom distance), while on average about 89% of the H-bonds satisfy
this criterion for the “perfect match” models. Second, we check the tethers between the model
and the template, and found that the average length of the tethers is about 0.5 Å and that
individual tethers usually do not exceed 1.5 Å. Finally, we performed 10 independent runs
with single mismatches in the polyamide sequences and found a consistent increase in the
complex conformational energy compared to the perfect match case.

A new important polyamide residue, N-diaminoalkylpyrrole, has been added recently to
the polyamide design repertoire. Polyamides with a diaminoalkyl “positive patch” not only
allow reliable inhibition of transcription factors with exclusive major groove binding, e.g.
bZIP proteins, but also improve affinity and specificity of DNA recognition. Thus, using the
alkylpyrrole positive patch in combination with a C-terminal N-methylamide as a “tail”, we
might be able to improve polyamide gene inhibitors in many cases (Figure 7). We designed
and optimized the geometry of new N-diaminoalkylpyrrole, N-diaminoalkylimidazole and
N-methylamide residues, and incorporated them into the library of polyamide elements.

c. Calculate global minimum conformations for each complex and evaluate polyamide-DNA
binding energy.

The annealing procedure employed in the global energy optimization of the complex is
described above. We performed a separate study with three polyamide-DNA complexes to
assess global convergence of energy optimizations in our special case. For each model we
used 20 independent runs of the procedure with different annealing schedules. In all three
cases we found slight variability in the results of different runs, with an average
conformational energy RMSD of ~0.7 kcal/mol and a geometry RMSD of ~0.9 Å. Such
conformational variability is expected in polyamide-DNA complexes, and has to be taken into
account by averaging results over several independent runs.

The much more flexible aminoalkyl and C-terminal methylamide moieties of the polyamides were
treated separately with the ICM Monte Carlo global optimization method to allow large-scale
changes in their conformations. ICM allows freezing of the variables in the rest of the
complex, which makes an exhaustive Monte Carlo search in the flexible parts of the molecule
possible on a reasonable time scale. We found this Monte Carlo search critical to avoid
trapping of the flexible parts of the polyamide molecule in local minima.

The polyamide-DNA binding energy for a given conformation of the complex was predicted as
a sum of the hydrogen bonding, van der Waals and electrostatic interaction energies between
polyamide and DNA, combined with different weights (1.0, 0.43 and 0.75, respectively). This
binding energy formula was previously found to be optimal by calibration with shorter
polyamides. For each polyamide-DNA complex, the binding energy was calculated as the
average of the binding energies of five independently minimized conformations. Binding energy
results for the best polyamide binders to the erbB2 promoter sequence 4 are presented in
Table IV. Note that the affinity of the “tandem hairpin” design in our predictions is
consistently better than that of the single-molecule topologies, i.e. soft hairpin and cyclic
chains. These results can be explained by the somewhat higher conformational flexibility of
the tandem hairpin topology, as well as the better affinity of newly discovered optimal short
tails for the G·C base pair (the NH(CH2)2OH tail and the NH(CH3) tail, the latter denoted
“~”). Also, our results confirm that the novel positively charged diaminoalkyl extensions
tend to improve the overall DNA binding affinity of polyamides, in addition to their role in
enhancing interference with gene transcription.
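
The weighted sum and the averaging over independently minimized conformations can be written out as a short sketch. The weights are the ones quoted above; the energy terms in the example are invented placeholder numbers, not results from this study, and reporting a sample standard deviation alongside the mean is an assumption about how the spread of the runs might be summarized.

```python
from statistics import mean, stdev

# Weights quoted in the text for the three interaction terms.
WEIGHTS = {"hbond": 1.0, "vdw": 0.43, "elec": 0.75}

def binding_energy(terms):
    """Weighted sum of interaction energy terms (kcal/mol) for one conformation."""
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)

def averaged_binding_energy(conformations):
    """Mean and sample standard deviation over independently minimized runs."""
    energies = [binding_energy(terms) for terms in conformations]
    return mean(energies), stdev(energies)

# Placeholder terms for five hypothetical runs (illustrative values only).
runs = [
    {"hbond": -10.0, "vdw": -20.0, "elec": -4.0},
    {"hbond": -11.0, "vdw": -19.0, "elec": -4.4},
    {"hbond": -9.5, "vdw": -21.0, "elec": -3.6},
    {"hbond": -10.5, "vdw": -20.5, "elec": -4.0},
    {"hbond": -10.0, "vdw": -19.5, "elec": -4.8},
]
avg, spread = averaged_binding_energy(runs)
print(f"predicted binding energy: {avg:.1f} +/- {spread:.1f} kcal/mol")
```

Averaging over several runs addresses the run-to-run variability noted above for the annealing procedure.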

To represent the diversity of polyamide topologies, the five best “tandem hairpins”, three
“soft hairpins” and two “cyclic polyamides” in Table IV have been selected as lead erbB2
inhibitors for future investigations. The structure of the best tandem hairpin complex is
presented in Figure 8.

Task 4: In vitro and in vivo testing

a.  Test  designed  polyamide  compounds  in  vitro  for  their  DNA  sequence  specificity  and 
ability  to  block  transcription  factors  binding  to  erbB2/Her2  promoter. 

b.  Test  these  compounds  for  their  efficacy  in  human  breast  cancer  cell  cultures. 

The experimental testing is not budgeted in the current grant and is expected to be performed
through an academic collaboration. Recently published data indicate that, with the exception
of certain T-cell lines, polyamide-dye conjugates tend to localize mainly in the cytoplasm
rather than in the nucleus of live cells [9,30]. Specifically, the study from Peter Dervan’s
group arrived at the conclusion that previously designed 8-ring polyamides, though very strong
erbB2 inhibitors in cell-free expression systems, may not be effective against breast cancer
cell lines because of their inability to access nuclear DNA. These new circumstances have led
our potential collaborators to postpone synthesis and testing of novel anti-erbB2 polyamides
until the problem of delivering polyamides to the cell nucleus is solved.

Several groups are currently working on the design of a new generation of polyamide-like
molecules with improved nuclear localization, and we plan to provide our expertise
in computer-assisted polyamide design to these groups to facilitate development of
polyamide conjugates with nuclear localization, without sacrificing their DNA binding
affinity and specificity.

Key Research Accomplishments

-  found  the  most  important  candidate  targets  for  antigene  therapy  within  the  proximal 
erbB2  promoter 

-  estimated  the  whole-genome  specificity  of  all  possible  short  fragments  within  this 
promoter  region 

-  designed  an  automatic  algorithm  to  list  all  possible  polyamide  topologies  matching  a 
given  DNA  sequence 

- wrote a program that generates initial 3D models of a polyamide-DNA complex from
its “sequence”, based on the known pattern of polyamide-DNA recognition and
global energy optimization in torsion coordinates

- employed a novel accurate force field (ICFF) in the modeling algorithm, making
reliable calculations feasible for longer polyamide-DNA complexes and facilitating
new design topologies

-  benchmarked  and  optimized  our  predictions  of  polyamide-DNA  binding  affinity, 
using  available  experimental  data 

-  tested  the  quality  of  our  3D  models  in  a  joint  modeling-NMR  study  of  10  ring 
polyamide  hairpins,  complexed  with  DNA 

- included new aminoalkyl-modified residues in the polyamide residue library,
improving both affinity and inhibitory effect of the designed polyamides

-  generated  all-atom  models  for  more  than  300  polyamides  complexed  with  DNA 
targets  in  erbB2  gene  promoter 

- predicted binding energy of these polyamides and selected the most potent polyamide
designs for further experimental studies

Reportable  outcomes 

•  Programs  and  algorithms: 

o  PolyVar  program  to  generate  possible  polyamide  sequences  for  a  given  DNA 
recognition  site. 

o  PolyGroove©  program  for  fast  3D  modeling  of  polyamide-DNA  complexes  from  the 
corresponding  residue  sequences  and  subsequent  binding  affinity  predictions 
(requires  ICM-pro  package). 


•  Meeting  Presentation  and  Abstracts: 

o Katritch, V., Abagyan, R.A. and Olson, W.K. (1999). Structural Modeling of
Polyamide-DNA Recognition. Mathematics and Molecular Biology VI, Santa Fe, NM.

o M. Totrov, V. Katritch, D. Pilch,* W.K. Olson,* J. Fernandez-Recio, R. Abagyan (2000).
Flexible Docking. The Scripps Research Institute Scientific Report, La Jolla,
CA.

o Bernhard H. Geierstanger, Colin J. Loweth, Vsevolod Katritch, Ruben Abagyan, Peter
G. Schultz & David E. Wemmer (2001). NOE distance constraints and structural
modeling of a ten-ring hairpin complex with DNA. Frontiers of NMR and Molecular
Biology Meeting, Keystone, CO.

o Vsevolod Katritch, Juan Fernandez Recio and Ruben Abagyan (2002). Targeting of
erbB2/Her2 DNA with polyamides. Era of Hope Department of Defense
(DOD) Breast Cancer Research Program (BCRP) meeting, Sept 24-28, Orlando, FL.

• Articles:

o  Vsevolod  Katritch,  Maxim  Totrov  and  Ruben  Abagyan  (2002).  ICFF:  A  new  method 
to  incorporate  implicit  flexibility  into  an  internal  coordinate  force  field.  J.  of  Comp. 
Chem.  in  press. 

o Bernhard H. Geierstanger, Colin J. Loweth, Vsevolod Katritch, Ruben Abagyan, Peter G.
Schultz & David E. Wemmer (2002). The modularity of DNA recognition by polyamide
molecules persists for a ten-ring hairpin in complex with an eight base pair binding
site. Submitted to J. of Am. Chem. Soc.

Conclusions 

In  this  project,  we  have  identified  the  best  candidate  dsDNA  targets  for  polyamide  binding 
within  the  most  important  proximal  region  of  the  erbB2  promoter  sequence  and  sorted  them 
according  to  their  whole-genome  specificity  and  overlap  with  transcription  activation  sites. 
Using  an  extended  set  of  binding  blocks,  choice  of  topology  variants  and  an  original 
automated  procedure,  we  have  listed  chemically  feasible  polyamides  matching  the  target 
dsDNA sequences, according to the polyamide-DNA pairing rules. We have developed a fast
and reliable algorithm to build 3D models of these polyamide-DNA complexes, based on
the known modular structure of the complexes and all-atom conformational energy
minimization. The accuracy of our structural modeling was confirmed by experimental
NOESY distance constraints, and binding energy predictions were extensively benchmarked
against available data on short polyamide hairpin-DNA affinity.

Using these algorithms, we have built more than 300 polyamide-DNA models targeting 12
and 13 base pair recognition sites within the three selected erbB2 promoter targets. Analysis
of the polyamide-DNA hydrogen bonding pattern and energy strain in the complex suggests that
even for such extended complexes all specific polyamide-DNA contacts can be
conformationally afforded if we use optimal polyamide chain topologies, with no more than
4 aromatic rings in a row. Our modeling also suggests that a diaminoalkyl group conjugated to
an aromatic residue not only extends the molecule into the DNA major groove but can also
substantially improve polyamide-DNA binding affinity. Binding energy evaluations allowed
selection of the best candidates for each of the 3 best topologies: tandem hairpins,
soft hairpins and cyclic chains.

The 10 chosen polyamide structures are expected to have high binding affinity and
whole-genome specificity for the erbB2 promoter DNA and can be considered highly specific
erbB2 inhibitors with potential anti-cancer activity. Further development of these lead
candidates into breast cancer drugs requires optimization of the nuclear membrane
permeability of polyamide-like molecules and further study of the pharmacokinetic features
of polyamides.

Table I. Polyamide-DNA pairing rules. Along with pyrrole (P), imidazole (I) and
hydroxypyrrole (H) rings, other elements include β-alanine, which can stack with any ring or
with itself to provide some flexibility, as well as two types of γ-links, used as flexible
“connectors” linking opposite polyamide strands.


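Family/matrix | Further information | Position | Strand | Core sim. | Matrix sim. | Sequence
V$SP1F/GC_01 | [illegible] | -148:-135 | [illegible] | 0.876 | 0.790 | gctgGGAGttgccg
V$LYMF/TH1E47_01 | Thing1/E47 heterodimer | -134:-119 | [illegible] | 1.000 | [illegible] | [illegible]
[illegible] | [illegible] | -120:-103 | [illegible] | 1.000 | [illegible] | [illegible]
[illegible] | [illegible] | -113:-105 | [illegible] | 0.819 | 1.000 | [illegible]
V$PCAT/CAAT_01 | cellular and viral CCAAT box | -82:-71 | [illegible] | 1.000 | [illegible] | [illegible]
[illegible] | nuclear factor Y (Y-box binding factor) | -82:-67 | (+) | 1.000 | 0.920 | tgctcCCAAtcacagg
[illegible] | [illegible] | -69:-55 | [illegible] | 1.000 | [illegible] | [illegible]
[illegible] | [illegible] | -57:-43 | [illegible] | 1.000 | 0.892 | aggtggagGAGGagg
[illegible] | [illegible] | -61:-40 | [illegible] | 0.857 | 0.772 | agCCCTcctcct
[illegible] | [illegible] | -36:-22 | [illegible] | 1.000 | 0.910 | tgaGGAAgtataaga
V$TBPF/TATA_C | Retroviral TATA box | -30:-21 | [illegible] | [illegible] | 0.779 | agTATAAGAa
[illegible] | [illegible] | [illegible] | [illegible] | 1.000 | [illegible] | [illegible]
V$NOLF/OLF1_01 | [illegible] | -1:-20 | [illegible] | 1.000 | [illegible] | [illegible]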

Table II. Results of MatInspector analysis for 600 bp promoter fragment containing the
major transcriptional start site (position 0), CCAAT and TATAA boxes, ETS response
element and other potential targets for antigene therapy.


No   DNA Sequence                        Regulatory elements
1    [illegible]TTGCCGACTC[illegible]    GC box element and Thing1/E47 heterodimer
2    cttcgttggaatgca                     c-Myb
4    GAGCGCGCTTGCTCCC                    COMF1 and CCAAT box

Table III. erbB2 promoter sites selected for polyamide targeting. Regulatory elements
possibly involved in erbB2 activation are highlighted. Documented single nucleotide
polymorphism (SNP) sites are shown in red.
Input sequence      Topology type                                                               Predicted binding energy, kcal

GAGCGCGCTTGCTCCC    Tandem soft hairpins, with g-NH3+ linkers, with diaminoalkyl group          -23.2 ± 0.9
IPIbIP-hIK
+  +
PIp-PIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Tandem soft hairpins, with g-NH3+ linkers, with diaminoalkyl group          -21.5 ± 1.4
IPIbIP-pIK
+  +
PIp-PIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Tandem soft hairpins, with g-NH3+ linkers, with diaminoalkyl group          -21.3 ± 1.6
IPIbIP-pIK
+  +
PIp-PIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Tandem soft hairpins, with g-NH3+ linkers, with diaminoalkyl group          -21.0 ± 1.2
IPI-iPbPIK
+  +
PIPbPi-PPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Tandem soft hairpins, with g-NH3+ linkers, no diaminoalkyl group            -20.1 ± 1.6
IPI-iPbHIP
+  +
PIPbPi-PPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Soft hairpin, with NH(CH3) tail and with diaminoalkyl group                 -19.4 ± 2.5
iPIbIPbPIK
|
-PIPbPIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Soft hairpin (reverse strand), with NH(CH3) tail and no diaminoalkyl group  -18.5 ± 1.7
IPIPbPPbIP-
|
PIPIbIPbPi
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Soft hairpin, with (CH2)2OH tail and with diaminoalkyl group                -18.2 ± 1.7
iPIPIPbPIK
+
-IPPIPIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Soft cyclic, with g- and g-NH3+ linkers, with diaminoalkyl group            -17.3 ± 1.4
iPIbIPbPIK
+  [illegible]
IPbIPIbPPI
CTCGCGCGAACGAGCC

GAGCGCGCTTGCTCCC    Soft cyclic, with g- and g-NH3+ linkers, with diaminoalkyl group            -16.9 ± 1.5
IPIbIPbHIK
+  |
PIPbPIbPPI
CTCGCGCGAACGAGCC

Table IV. Top ten suggested polyamide binders to the erbB2 promoter target sequence 4.
Accuracy of the energy predictions was assessed by five independent annealing
minimizations. One-letter codes for polyamide residues are: “P” - pyrrole, “I” - imidazole,
“H” - hydroxypyrrole, “K” - diaminoalkylpyrrole, “R” - diaminoalkylimidazole, “b” - β-alanine,
“|” - γ-linker, “+” - γ-NH3+ linker, [symbol illegible] - β-DP tail, “-” - NH(CH2)2OH tail,
“~” - NH(CH3) tail (ref. 23). The second polyamide molecule is colored red.
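The ± values in Table IV summarize five independent minimizations per design. The sketch below shows one way such a summary could be formed and compared. It is only an illustration: the five run energies are hypothetical placeholders, the use of the sample standard deviation is an assumption (the text does not say which spread measure was used), and the overlap test is a deliberately crude criterion.

```python
from statistics import mean, stdev


def summarize_runs(energies):
    """Return (mean, sample standard deviation) of repeated energy estimates."""
    if len(energies) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return mean(energies), stdev(energies)


def overlaps(a, b):
    """True if two (mean, spread) estimates differ by no more than the sum of their spreads."""
    (m1, s1), (m2, s2) = a, b
    return abs(m1 - m2) <= s1 + s2


# Hypothetical energies (kcal) from five annealing runs of a single design
runs = [-23.9, -22.1, -24.0, -22.6, -23.4]
m, s = summarize_runs(runs)
print(f"{m:.1f} +/- {s:.1f}")

# Values as listed in Table IV: best design vs. second and tenth designs
print(overlaps((-23.2, 0.9), (-21.5, 1.4)))
print(overlaps((-23.2, 0.9), (-16.9, 1.5)))
```

By this criterion the two top-ranked designs are not clearly separated by the predicted energies, whereas the first and tenth are.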
Legend: Pyrrole (P), Hydroxypyrrole (H), Imidazole (I)

Pairing diagram
5'-T G G T C A-3'
3'-A C C A G T-5'


Figure 3. Structural basis of polyamide-DNA recognition. Hydrogen bonds required for the
binding specificity of Pyrrole (P), Imidazole (I) and Hydroxypyrrole (H) are shown as dashed
lines. Also shown is the standard diagram representation of the complex.


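[Figure 4 image not recoverable. Surviving labels: position markers -150, -100 and -50;
target sequence numbers 1, 2, 3, 4 and 6; sequence fragments AGCTGC (at -150) and
AATCACA GGAGAAGGAGGAGGTGGAG.]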


Figure 4. Sequence of the proximal region of the erbB2 promoter. Predicted core activation
sites are underscored and experimentally confirmed sites are shown in bold. Arrows mark
two palindromic sequences involved in transcription activation. We have highlighted and
numbered the 16-bp sequences chosen as putative targets for further analysis.


Specificity  of  13  bp  fragments  in  erbB2  promoter 


Figure 5. Whole-genome specificity analysis for 13 bp fragments of the proximal erbB2
promoter sequence. Note that the rarest fragments correspond to sequences 1, 2 and 4
(see Figure 4), while fragments 5 and 6, flanking the TATA box, have very poor
whole-genome specificity, comparable to that of the control fragment with a GGA
repeat (the white bar).
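The idea behind this analysis can be sketched as a sliding-window count: each 13 bp window of the promoter is looked up in the genome on both strands, and windows with the fewest hits are the most specific. The code below is a minimal illustration, not the procedure used to produce Figure 5. It assumes the genome fits in memory as a plain string, and the `toy_genome` is invented for the example; a real scan of the human genome would need a streamed or indexed search.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(genome, k):
    """Count every k-mer on the forward strand of `genome`."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def window_specificity(promoter, genome, k=13):
    """For each k-bp window of `promoter`, count its occurrences in `genome` on either strand."""
    counts = count_kmers(genome, k)
    promoter = promoter.upper()
    result = []
    for i in range(len(promoter) - k + 1):
        window = promoter[i:i + k]
        rc = reverse_complement(window)
        n = counts[window]
        if rc != window:  # do not double-count self-complementary windows
            n += counts[rc]
        result.append((i, window, n))
    return result


# Toy data: target sequence 4 embedded once in an invented "genome"
# that also carries a GGA repeat, mimicking the control fragment.
target4 = "GAGCGCGCTTGCTCCC"
toy_genome = "TTT" + target4 + "AAA" + "GGA" * 10

for offset, window, hits in window_specificity(target4, toy_genome):
    print(offset, window, hits)
print(window_specificity("GGAGGAGGAGGAG", toy_genome))
```

In the toy genome every window of the target occurs once, while the GGA-repeat window occurs several times, which is the qualitative contrast the figure describes.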
INPUT SCRIPT:

#!/home/sevak/icm2/icmL
call _startup

call _PolyGroove  ## Polyamide modeling tools in ICM scripting language
#aregul (template_obj) (DNA_seq) (Polyamide_seq) (i_start) (n_steps) (l_display) (l_freeMin)
aregul "hpl_template.ob" "GGGAGCGCGCTTGCTCCCA" "IPI-iPbPIP+IPP-iPbPIP+" 5 100 no no
quit
OUTPUT FILE:

GGGAGCGCGCTTGCTCCCA+IPI-iPbPIP+IPP-iPbPIP+.ob

#_summary : icmName GGGAGCGCGCTTGCTCCCA+IPI-iPbPIP+IPP-iPbPIP+
#_summary : objCode hpl_template.ob
#_summary : nChains 4
#_summary : chainList watson crick a b
#_summary : nResidues 60
#_summary : nFreeVar 322
#_summary : vwCutoff 7.5
#_summary : hbCutoff 3.0
#_summary : electroMethod distance dependent
#_summary : dielConst 4.0
#_summary : surfaceMethod atomic solvation
#_summary : eTotal -1208.74
#_summary : grad 290.42
#_summary : eVacuum -917.15
#_summary : eNonEl -695.44
#_summary : e_vw -751.66
#_summary : e_hb -79.73
#_summary : e_to 135.95
#_summary : e_el -221.71
#_summary : eSolvat -291.59
#_summary : eEntropy 0.00
#_summary : tzWeight 0.24
#_summary : rmsd 1.00
#_summary : rmsdBackbone 1.04
#_summary : nTz 320
#_summary : resNotTz 18


Figure 6. PolyGroove input and output files for one of the DNA-polyamide sequences.
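Output of this kind is easy to post-process. The sketch below, which is not part of PolyGroove, parses lines of the form `#_summary : key value` into a dictionary and checks how the listed energy terms relate to each other. The additive relations tested here are an observation about the numbers printed above; treating them as the definitions of the ICM energy terms is an assumption. The function name and the embedded sample are for illustration only.

```python
SAMPLE = """\
#_summary : nChains 4
#_summary : chainList watson crick a b
#_summary : eTotal -1208.74
#_summary : eVacuum -917.15
#_summary : eNonEl -695.44
#_summary : e_vw -751.66
#_summary : e_hb -79.73
#_summary : e_to 135.95
#_summary : e_el -221.71
#_summary : eSolvat -291.59
"""


def parse_summary(text):
    """Collect '#_summary : key value' lines into a dict; numeric values become floats."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#_summary"):
            continue
        _, _, rest = line.partition(":")
        parts = rest.split(None, 1)
        if not parts:
            continue  # no key found after the colon
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        try:
            summary[key] = float(value)
        except ValueError:
            summary[key] = value  # keep non-numeric fields as text
    return summary


s = parse_summary(SAMPLE)

# Assumed relations, checked against the printed values only
non_el = s["e_vw"] + s["e_hb"] + s["e_to"]
vacuum = s["eNonEl"] + s["e_el"]
total = s["eVacuum"] + s["eSolvat"]
print(abs(non_el - s["eNonEl"]) < 0.01)
print(abs(vacuum - s["eVacuum"]) < 0.01)
print(abs(total - s["eTotal"]) < 0.01)
```

For the values in the listing all three checks hold to within rounding, which suggests that the total energy is reported as the sum of a vacuum term and a solvation term.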
Figure 7. N-diaminoalkylpyrrole containing polyamide in the DNA minor groove. This
globally optimized conformation shows interaction of the diaminoalkyl tail with the DNA
phosphates, which ensures inhibition of major-groove binding transcription factors by
polyamides of this type.
Figure 8. Recognition of the target erbB2/Her2 DNA sequence 4 by the 8-ring tandem
hairpin polyamide, predicted to have the best binding energy among ~300 polyamide designs
tested. Pairing diagram is shown below:

GAGCGCGCTTGCTCCC 

IPIbIP-hIK 